An empirical Bayes mixture model for SNP detection in pooled sequencing data

Author

  • Baiyu Zhou

Abstract

Motivation: Detecting single-nucleotide polymorphisms (SNPs) in pooled sequencing data is more challenging than in individual sequencing because of sampling variation across pools. To differentiate SNP signal from sequencing error effectively, the sequencing error must be estimated accurately. In this article, we propose an empirical Bayes mixture (EBM) model for SNP detection and allele frequency estimation in pooled sequencing data.

Results: The proposed model reliably learns the error distribution by pooling information across pools and genomic positions. In addition, the EBM model builds in characteristics unique to pooled sequencing data, boosting the sensitivity of SNP detection. For large-scale inference in SNP detection, the EBM model provides a flexible and robust way to estimate and control the local false discovery rate. We demonstrate the performance of the proposed method through simulation studies and a real data application.

Availability: An implementation of this method is available at https://sites.google.com/site/zhouby98.
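The abstract rests on the two-group empirical Bayes idea: most positions carry only sequencing errors, a few carry a real variant, and the error distribution is learned by pooling information across positions. The sketch below illustrates that general idea only; it is not the EBM model from the paper and not the code at the URL above. It assumes a much simpler, hypothetical setup: one non-reference read count x_i and depth n_i per position, a null component Binomial(n_i, err) with a single shared error rate, and an alternative component Binomial(n_i, af) with one common allele frequency, so it ignores the pool-to-pool sampling variation that the paper models. Parameters are fitted by EM, the local false discovery rate is lfdr(x) = pi0 f0(x) / f(x), and positions are called by taking the largest set whose mean lfdr is at most alpha. The function names (fit_binomial_mixture, posterior_snp, call_snps) and the simulated data are hypothetical.

```python
import math
import random

# Hypothetical, simplified two-group binomial mixture for illustration only;
# this is NOT the paper's EBM model (no per-pool allele frequencies).


def _sigmoid(d):
    """Numerically stable logistic function."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


def posterior_snp(counts, depths, err, af, pi1):
    """Return P(SNP | x_i) for every position under the current parameters."""
    prior_log_odds = math.log(pi1 / (1.0 - pi1))
    log_alt = math.log(af / err)
    log_ref = math.log((1.0 - af) / (1.0 - err))
    # log f1(x)/f0(x) for two binomials with the same depth n;
    # the binomial coefficient C(n, x) cancels in the ratio.
    return [_sigmoid(prior_log_odds + x * log_alt + (n - x) * log_ref)
            for x, n in zip(counts, depths)]


def fit_binomial_mixture(counts, depths, err=0.01, af=0.3, pi1=0.1,
                         max_iter=500, tol=1e-10):
    """EM fit of null Binomial(n, err) vs. SNP Binomial(n, af).

    Returns (err, af, pi1, lfdr) with lfdr[i] = P(null | x_i).
    """
    eps = 1e-9

    def clamp(v):
        return min(max(v, eps), 1.0 - eps)

    for _ in range(max_iter):
        w = posterior_snp(counts, depths, err, af, pi1)  # E-step
        # M-step: weighted MLEs. The shared error rate borrows strength
        # from all positions, each weighted by P(null | x_i).
        null_x = sum((1.0 - wi) * x for wi, x in zip(w, counts))
        null_n = sum((1.0 - wi) * n for wi, n in zip(w, depths))
        alt_x = sum(wi * x for wi, x in zip(w, counts))
        alt_n = sum(wi * n for wi, n in zip(w, depths))
        new_err = clamp(null_x / max(null_n, eps))
        new_af = clamp(alt_x / max(alt_n, eps))
        new_pi1 = clamp(sum(w) / len(w))
        delta = abs(new_err - err) + abs(new_af - af) + abs(new_pi1 - pi1)
        err, af, pi1 = new_err, new_af, new_pi1
        if delta < tol:
            break
    lfdr = [1.0 - wi for wi in posterior_snp(counts, depths, err, af, pi1)]
    return err, af, pi1, lfdr


def call_snps(lfdr, alpha=0.05):
    """Largest set of positions whose mean local fdr is <= alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected, total = [], 0.0
    for k, i in enumerate(order, start=1):
        total += lfdr[i]
        if total / k > alpha:  # mean of a sorted prefix only grows, so stop
            break
        selected.append(i)
    return sorted(selected)


if __name__ == "__main__":
    # Simulated data (hypothetical): 5% SNPs at AF 0.25, error rate 0.005.
    rng = random.Random(1)
    depths = [rng.randint(30, 80) for _ in range(2000)]
    is_snp = [rng.random() < 0.05 for _ in depths]
    counts = [sum(rng.random() < (0.25 if s else 0.005) for _ in range(n))
              for s, n in zip(is_snp, depths)]
    err, af, pi1, lfdr = fit_binomial_mixture(counts, depths)
    calls = call_snps(lfdr, alpha=0.05)
    print(f"err={err:.4f} af={af:.3f} pi1={pi1:.3f} calls={len(calls)}")
```

Averaging the local fdr over the called set estimates the false discovery rate of that set, so thresholding the running mean rather than each lfdr individually is one common way to turn local fdr values into a rule that controls the FDR at roughly alpha, under the assumed model.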

Similar resources

Empirical Bayes Analysis of Two-Factor Experiments under Inverse Gaussian Model

A two-factor experiment with interaction between factors wherein observations follow an Inverse Gaussian model is considered. Analysis of the experiment is approached via an empirical Bayes procedure. The conjugate family of prior distributions is considered. Bayes and empirical Bayes estimators are derived. Application of the procedure is illustrated on a data set, which has previously been an...

A cross-sample statistical model for SNP detection in short-read sequencing data

Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling informati...

Invariant Empirical Bayes Confidence Interval for Mean Vector of Normal Distribution and its Generalization for Exponential Family

Based on a given Bayesian model of multivariate normal with known variance matrix we will find an empirical Bayes confidence interval for the mean vector components which have normal distribution. We will find this empirical Bayes confidence interval as a conditional form on ancillary statistic. In both cases (i.e. conditional and unconditional empirical Bayes confidence interval), the empiri...

Empirical Bayes Estimation in Nonstationary Markov chains

Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N ≥ 2 independent and identically dis...

Unobserved Heterogeneity in Longitudinal Data: An Empirical Bayes Perspective

Empirical Bayes methods for Gaussian and binomial compound decision problems involving longitudinal data are considered. A new convex optimization formulation of the nonparametric (Kiefer-Wolfowitz) maximum likelihood estimator for mixture models is used to construct nonparametric Bayes rules for compound decisions. The methods are illustrated with some simulation examples as well as ...

Journal:
  • Bioinformatics

Volume 28, Issue 20

Publication date: 2012